Probabilistic characterization of nearest neighbor classifier

Authors

  • Amit Dhurandhar
  • Alin Dobra
Abstract

The k-Nearest Neighbor classification algorithm (kNN) is one of the simplest yet most effective classification algorithms in use. It finds major applications in text categorization, outlier detection, handwritten character recognition, fraud detection, and other related areas. Though sound theoretical results exist regarding the convergence of the Generalization Error (GE) of this algorithm to the Bayes error, these results are asymptotic in nature. The understanding of the behavior of the kNN algorithm in real-world scenarios is limited. In this paper, assuming categorical attributes, we provide a principled way of studying the non-asymptotic behavior of the kNN algorithm. In particular, we derive exact closed-form expressions for the moments of the GE for this algorithm. The expressions are functions of the sample, and hence can be computed given any joint probability distribution defined over the input-output space. These expressions can be used as a tool that aids in unveiling the statistical behavior of the algorithm in settings of interest, e.g., determining an acceptable value of k for a given sample size and distribution. Moreover, Monte Carlo approximations of such closed-form expressions have been shown in [6,5] to be a superior alternative in terms of speed and accuracy when compared with computing the moments directly using Monte Carlo. This work employs the semi-analytical methodology that was proposed recently to better understand the non-asymptotic behavior of learning algorithms.
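To make the quantities in the abstract concrete, the following is a minimal sketch (not the authors' closed-form derivation) of the direct Monte Carlo baseline the paper compares against: the GE of a kNN classifier with categorical attributes is a random variable over draws of the training sample, and its first two moments can be estimated by repeatedly sampling a training set from a known joint distribution and computing the exact GE of the resulting classifier. The toy distribution `JOINT`, the overlap distance, and all function names here are illustrative assumptions.

```python
import random
from collections import Counter

# Illustrative joint distribution P(x, y) over one categorical attribute
# x in {0, 1, 2} and labels y in {0, 1}. (Assumed for this sketch.)
JOINT = {(0, 0): 0.30, (0, 1): 0.05,
         (1, 0): 0.10, (1, 1): 0.20,
         (2, 0): 0.05, (2, 1): 0.30}

def sample(n, rng):
    """Draw n i.i.d. (x, y) pairs from JOINT."""
    keys, probs = zip(*JOINT.items())
    return rng.choices(keys, weights=probs, k=n)

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points under the
    overlap distance for a categorical attribute: 0 if equal, 1 otherwise."""
    neighbors = sorted(train, key=lambda t: 0 if t[0] == x else 1)[:k]
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]

def generalization_error(train, k):
    """Exact GE under the known joint: total probability mass of the
    (x, y) cells the trained classifier gets wrong."""
    return sum(p for (x, y), p in JOINT.items()
               if knn_predict(train, x, k) != y)

def ge_moments(n_train=20, k=3, trials=500, seed=0):
    """Monte Carlo estimates of E[GE] and Var[GE] over training samples."""
    rng = random.Random(seed)
    errs = [generalization_error(sample(n_train, rng), k)
            for _ in range(trials)]
    mean = sum(errs) / trials
    var = sum((e - mean) ** 2 for e in errs) / trials
    return mean, var
```

Since the Bayes error of `JOINT` is 0.20 (pick the majority label per attribute value), the estimated mean GE must lie at or above that floor; the paper's contribution is replacing this trial loop with exact closed-form expressions for the same moments.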


Related articles

Object-to-group probabilistic distance measure for uncertain data classification

Uncertain objects, where each feature is represented by multiple observations or a given or fitted probability density function, arise in applications such as sensor networks, moving object databases and medical and biological databases. We propose a methodology to classify uncertain objects based on a new probabilistic distance measure between an uncertain object and a group of uncertain objec...

Multiple k-Nearest Neighbor Classifier and Its Application to Tissue Characterization of Coronary Plaque

In this paper we propose a novel classification method for the multiple k-nearest neighbor (MkNN) classifier and show its practical application to medical image processing. The proposed method performs fine classification when a pair of the spatial coordinate of the observation data in the observation space and its corresponding feature vector in the feature space is provided. The proposed MkNN...

Comparing pixel-based and object-based algorithms for classifying land use of arid basins (Case study: Mokhtaran Basin, Iran)

In this research, two techniques of pixel-based and object-based image analysis were investigated and compared for providing land use map in arid basin of Mokhtaran, Birjand. Using Landsat satellite imagery in 2015, the classification of land use was performed with three object-based algorithms of supervised fuzzy-maximum likelihood, maximum likelihood, and K-nearest neighbor. Nine combinations...

Bayesian Classifier with K-nearest Neighbor Density Estimation for Slope Collapse Prediction

Heavy rainfall and typhoon oftentimes cause the collapse of hillslopes across mountain roads. Disastrous consequences of slope collapses necessitate the approach for predicting their occurrences. In practice, slope collapse prediction can be formulated as a deterministic classification problem with two class labels, namely “collapse” and “non-collapse”. Nevertheless, due to the criticality and ...

Bayesian based classifier for mining image classes

In this paper, we demonstrate how semantic categories of images can be learnt from their color distributions using an effective probabilistic approach. Many previous probabilistic approaches are based on the Naïve Bayes that assume independence among attributes, which are represented by a single Gaussian distribution. We use a derivative of the Naïve Bayesian classifier, called Flexible Bayesia...

On the Use of Diagonal and Class-Dependent Weighted Distances for the Probabilistic k-Nearest Neighbor

A probabilistic k-NN (PKnn) method was introduced in [13] from the Bayesian point of view. That work showed that posterior inference over the parameter k can be performed in a relatively straightforward manner using Markov Chain Monte Carlo (MCMC) methods. The method was extended by Everson and Fieldsen [14] to deal with metric learning. In this work we propose two different dissimilarities f...


Journal:
  • Int. J. Machine Learning & Cybernetics

Volume 4, Issue –

Pages –

Publication year: 2013